**Be prepared to debug a scan yield example:**

The first thing I would do is gather as much information about the yield issue as possible:

* Does it affect Sail yield at different voltage corners?
* Is it the entire lot? Or just a few wafers?
* Does it affect both Sail 1 and Sail 2 in the same way?
* Is there any wafer regionality to the fails?
* Is there any slot level dependency?
* Are there any splits on the lot?
* What blocks are failing most often? Is there a pattern to them?
* What latches are on the blocks that are failing? Are they related?
* Where are the blocks located in the scan chain?
* What scan outs are failing more often on the block? Is it a specific scan chain that always fails?
* Do the block fails have regionality dependence?
* Do the block fails have voltage dependence?
* Does it look like a systematic or random defect?
  + Wafer mishandling? One time thing?
  + Wafer misprocessing? Tool issue?
  + How will it correlate to the product die?

Next I would look for correlations, do data mining, and consult experts, while giving regular updates to the rest of the team.

If it seems to be a recurring problem, I would set up a way to monitor the defect, write a classification algorithm to label the issue automatically, and come up with a metric for how much yield impact this specific defect has.
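
A minimal sketch of what such a rule-based label and yield-impact metric could look like, assuming failing dies are given as (x, y) centre coordinates in mm relative to the wafer centre. The function names, the 120 mm edge radius, and the thresholds are all hypothetical placeholders, not values from a real program:

```python
from math import hypot

def edge_fail_fraction(fails, edge_radius_mm=120.0):
    """Fraction of failing dies whose centre is at or beyond edge_radius_mm."""
    if not fails:
        return 0.0
    edge = sum(1 for x, y in fails if hypot(x, y) >= edge_radius_mm)
    return edge / len(fails)

def yield_impact(total_dies, signature_fails):
    """Yield points (percent of dies on the wafer) lost to one signature."""
    return 100.0 * signature_fails / total_dies

def label_wafer(fails, total_dies, min_edge_fraction=0.7, min_impact=1.0):
    """Label a wafer and report the yield impact of its fails.

    Thresholds are assumed values; in practice they would be tuned
    against wafers that were labelled by hand.
    """
    impact = yield_impact(total_dies, len(fails))
    is_edge = (edge_fail_fraction(fails) >= min_edge_fraction
               and impact >= min_impact)
    return ("edge_signature" if is_edge else "other"), impact
```

Tracking the returned impact number per lot over time is one way to tell whether a proposed fix is actually helping.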

**Process optimization for yield and power/performance**

* P metal residual stuck in m026 nfet
  + Can be fixed by making the PC CD bigger: if the gate is too narrow, it’s harder to etch cleanly. The trade-off is a longer-channel device, so electrons have to travel further (slower performance).
  + Reliability concern: a harder etch can cause long-term issues, while a softer etch leaves residue behind.
* A taller gate height results in less leakage and more conduction, which means better performance in general. However, if the gate is too tall, the tapered CA contact above it gets closer to touching the gate, which risks a short or leakage.

**Demonstrate knowledge of ATE tests**

* Stuck at 1, stuck at 0 test – feed in checkerboard twice inversed, slow speed, fails don’t affect next latches
* At speed (transition) tests – find slow to rise, slow to fall fails, feed in checkerboard and if the transition doesn’t happen or it happens at the wrong time, there’s a timing defect
* Path delay test -
* IDDQ Test – measure supply current (IDD) at the quiescent state (inputs are held at static value and circuit is not switching). Measure at different static states to see if there’s a defect that draws excess current.
* Toggle Test – not for defect detection. Try different circuit configurations fast, to make sure you can drive the latch to 1 or 0. Can be done fast so used for burn-in testing to cause high activity in the circuit.
* N-detect/ Embedded Multiple Detect (EMD)
* Deterministic Bridging
* Small-Delay Defects
* ABIST aka MBIST (Array built in self test) – scan in instructions and it spits out results
* LBIST (Logic built in self test)
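
To make the stuck-at idea concrete, here is a toy sketch of the checkerboard-plus-inverse approach. It assumes a deliberately simplified model in which each latch is read back independently (so one bad latch does not corrupt its neighbours); real chain defects that corrupt downstream bits are out of scope. The function names are hypothetical:

```python
def checkerboard(n, start=0):
    """Alternating 0/1 pattern of length n, beginning with `start`."""
    return [(start + i) % 2 for i in range(n)]

def read_back(pattern, stuck):
    """Values observed after loading `pattern`.

    stuck maps latch index -> the value (0 or 1) that latch is stuck at.
    Healthy latches simply return the loaded bit.
    """
    return [stuck.get(i, bit) for i, bit in enumerate(pattern)]

def find_stuck(n, stuck):
    """Apply a checkerboard and its inverse; report latches that mismatch."""
    found = {}
    for start in (0, 1):
        expected = checkerboard(n, start)
        observed = read_back(expected, stuck)
        for i, (e, o) in enumerate(zip(expected, observed)):
            if e != o:
                found[i] = o  # the latch is stuck at the observed value
    return found
```

Both patterns are needed because a latch stuck at 0 looks healthy whenever the pattern happens to ask it for a 0; the inverse pattern asks every latch for the opposite value.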

In FA – thermal detection and photon detection. Shmoo many tests on a defect to see under what conditions it can actually pass, then park the chip at the pass/fail boundary and scan a laser over it to see which x,y location turns the fail into a pass. This is CPA (Critical Parameter Analysis).

Latch-up fails – catastrophic fails where the device turns on and stays on

Continuity test – short or open. Power supply set to 0v and pull current out of pad and measure voltage, see if the value is high or low to tell if short or open.
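
A small sketch of how the measured pad voltage could be binned. With the supply at 0 V and current pulled out of the pad, a healthy pin typically sits about one protection-diode drop below ground; a short reads close to 0 V and an open runs away to the tester clamp. The limits below are assumed illustrative values, not limits from a real test program:

```python
def classify_continuity(v_measured, short_limit=-0.2, open_limit=-1.5):
    """Bin a continuity measurement (volts) into short / open / pass.

    short_limit and open_limit are hypothetical thresholds in volts.
    """
    if v_measured > short_limit:
        return "short"   # almost no voltage drop: pad shorted to ground
    if v_measured < open_limit:
        return "open"    # voltage ran to the clamp: no current path
    return "pass"        # roughly one diode drop, as expected
```

For example, `classify_continuity(-0.65)` falls in the passing window under these assumed limits.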

SRAM – ABIST, direct access to addresses to force write and read from pads. We have data pins

There’s potential for an addressing issue, where the address logic is broken and you only ever write to the same cells. BFM – have expected data output, log fails.

CPP – the spacing between gates (54 or 60 nm CPP). A tight CPP means CA and PC come close together, with potential short and leakage issues. Logic uses the extra 6 nm to avoid this.

**Advanced semiconductor technology node process integration knowledge**

* Physical structure, finfet, process limitations, PC CD, bridging, shifts, lithography related fails, CMP related, leaving liner behind, contacts – can’t dig deep enough to make contact. Stuff left over, or etched out. 2 different contacts CA – source drain, CB on the gate, different levels – how to not over etch or under etch

**WAT/eTest, Inline data analysis and correlation to yield**

* Wafer Acceptance Testing (WAT) also known as Process Control Monitoring (PCM) data is data generated by the Fab at the end of manufacturing and generally made available to the fabless customer for every wafer.

**Knowledge of Failure analysis techniques for speed path and systematic yield issue root cause**

**Scan ATPG and MBIST fail debug – scan diagnostics and bitmaps volume analysis**

* (Automatic test pattern generation)

**Think about impactful projects and your role in the process. What was the result?**

Ghost Fin

Situation – It was during the early stages of IBM’s 7nm microprocessor development, and we were seeing our Logic macro failing around the edge of the wafer. This was very concerning for the team because, besides the obvious yield loss, we didn’t know how these kerf fails would translate to the product chip.

Task – As the person in charge of inline functional yield, I was responsible for classifying and diagnosing the issue so we could approach fixing it.

Action – I did a deep classification of the problem, going through my checklist of things to look at. I found that it affected Sail 1 and Sail 2 differently. I found which blocks were failing and that only X8M and X3M latches were involved. I found that only specific scan outs failed. I showed this to the guy in charge of running the tests because he has a deeper knowledge of what each of these latches looks like and can look at the different layers in the design. We found that the scan chains failed only where there is a 2 Fin with a 4 Fin RX below it. We had internal discussions and brought our findings to Samsung, who performed an SEM and saw a hazy residue around the gate causing a short between the gate and the metal contact. It occurred in those specific areas because the 4 Fin RX structure would shift up on the top edge of the wafer and down on the bottom edge of the wafer, causing an incomplete etch on the gate and leading to the short. Because this was not an easy fix, I set up a classification algorithm to track the problem, along with a metric for how much yield impact the defect was causing, so that we could tell whether we were improving.

Result – After trying different proposed fixes, we eventually eradicated the problem, although we still sometimes see it on the partial chips. Since those are partial chips, it did not end up impacting our product yield.

**Fin Res**

Situation – In 14nm we were seeing fails that seemed to appear randomly from time to time. They weren’t dense fails, but they spread pretty far and were affecting our inline SRAM yield.

Task – As the person in charge of inline functional yield, I was responsible for classifying and diagnosing the issue so we could approach fixing it.

Action – I did a deep classification of the problem, going through my checklist of things to look at. The issue was not voltage dependent; it was a hard fail. It hit many wafers to differing degrees but did not hit all wafers. It hit both SRAM macros the same way. It hit the bottom half of the wafer only. It showed SCF/LoFBL fails only. This was the first classification model I built. I manually identified the impacted wafers at first and kept track of them in an Excel spreadsheet. Then I figured out how to pull the data into a Jupyter notebook with SQL and wrote an algorithm based on everything we knew about the defect. Over time, I built a random forest classifier using the wafers I had already labeled and the features from my in-house algorithm. Using this new classifier’s outputs as a metric, I stacked up all the wafers with this fail and looked at which segments failed on each specific chip. Only certain segments failed on these chips, which showed that there were hotspots. This suggested something physical was causing these fails, maybe a tool issue. Then I used my metric to rank how bad each wafer was in each lot, used that information to comb through all our process steps, and found a slot order correlation at the FC RIE cut process step. Further digging showed that one of the 2 tools was causing the issue. We got TPLY on the fails and it showed residue left around the Fins causing a short from the Gate to the TS.

Result – We implemented a FOUP purge and decommissioned the bad tool, and the issue went away.
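
The "comb through all process steps" part can be sketched roughly as below: for every step, group the per-wafer defect metric by the tool that processed the wafer and rank steps by the gap between the worst and best tool. This is an illustrative simplification, not the actual analysis; the data layout and the function name are assumptions:

```python
from collections import defaultdict
from statistics import mean

def rank_steps_by_tool_gap(metric, history):
    """Rank process steps by how strongly the defect metric splits by tool.

    metric:  {wafer_id: defect score}            (hypothetical layout)
    history: {step_name: {wafer_id: tool_name}}  (hypothetical layout)
    Returns [(step, gap, worst_tool)], largest gap first.
    """
    ranked = []
    for step, tool_of in history.items():
        by_tool = defaultdict(list)
        for wafer, tool in tool_of.items():
            if wafer in metric:
                by_tool[tool].append(metric[wafer])
        if len(by_tool) < 2:
            continue  # a single tool gives nothing to compare against
        means = {tool: mean(scores) for tool, scores in by_tool.items()}
        worst = max(means, key=means.get)
        gap = means[worst] - min(means.values())
        ranked.append((step, gap, worst))
    return sorted(ranked, key=lambda r: r[1], reverse=True)
```

A large gap at one step is only a lead to follow up on; with many steps and few wafers per tool, some gaps will appear by chance.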

**Vmin Yield Loss Issue**

Situation – After some device shifts were made over time, we started seeing the Logic fail at lower voltage corners.

Task – As the person in charge of inline functional yield, I was responsible for classifying and diagnosing the issue so we could approach fixing it.

Action – I did a deep classification of the problem, going through my checklist of things to look at. I saw that it only affected our lowest voltage corner; it affected the entire lot and affected Sail 1 and 2 the same way; the wafer regionality was the donut region where our device is hottest; there was no slot level dependency; and we saw the same behavior on Vt split lots where the pfet is hotter. The failing blocks were 1-6 and 26, 27, 28. No specific types of latches were failing. These blocks are located at the left-most and right-most sides of our logic macro. All scan outs were failing. I ran correlations and found a very strong correlation between this fail and the IDD standby current. This was a little tricky to figure out, but I consulted with the test guy, we looked at the design of our logic macro, and we saw that we don’t have any pull up voltages at the edges of the macro. There is definitely some voltage drooping going on at the edges of the macro when the device gets too hot, so those blocks fail at lower voltages.
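
The correlation step can be illustrated with a plain Pearson coefficient between a per-wafer fail count and the IDD standby current. This is a minimal sketch; the function name and the input lists are hypothetical, and a rank-based (Spearman) correlation may be a better fit if the relationship is not linear:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

A value near +1 would mean wafers with higher standby current tend to have more low-voltage fails, which points toward a leakage or heating mechanism rather than a random defect.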

Result – I put together my findings and suggested that this might be a problem specific to our logic macro. It shouldn’t impact our product but we should do a test just in case. We know that later at a higher metal level, the voltage is boosted a lot more so if we tested our logic macro there, we should see these Vmin fails go away. We did that and it confirmed our theory.

**Dashboards – Reliability Vmax, Inline Functional**

**Change Point Detection**

Situation – There was an issue on the product chip, and we found that it had already shown up in a device parameter earlier in the line.

Task – While discussing the issue, one of our senior engineers asked, "Why didn’t we notice this sooner?" No one was monitoring that parameter because we have so many. I suggested a way to comb through all the parameters to find shifts and change points.

Action – Built change point detector

Result – There was not a lot of buy-in at first because it required a lot of manual work: looking at the shifted parameters, putting together a report, and sending it to experts to review was too much trouble. When we had an intern come in, I had a second chance to put this project together. We automated the plots showing the shifts, drew lines showing the before and after medians, and labeled the date and lot where the shift occurred. We generated a table on a dashboard showing the parameter name, a label (device, functional, reliability, etc.), shift date, shift lot, value before, and value after. You can click on a row and it’ll generate a graph.
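
As a rough illustration of the idea (not necessarily the method the detector used), a single median shift in one parameter can be located by trying every split point, fitting a median to each side, and keeping the split with the smallest total absolute deviation. The function name, `min_seg`, and `threshold` are assumed values:

```python
from statistics import median

def find_shift(values, min_seg=5, threshold=3.0):
    """Locate one median shift in a time-ordered series.

    Returns (index, median_before, median_after), where index is the first
    point after the shift, or None if no split is strong enough.
    """
    best = None
    for k in range(min_seg, len(values) - min_seg + 1):
        before, after = values[:k], values[k:]
        med_b, med_a = median(before), median(after)
        # Absolute deviations from each segment's own median.
        resid = [abs(v - med_b) for v in before] + [abs(v - med_a) for v in after]
        cost = sum(resid)
        if best is None or cost < best[0]:
            best = (cost, k, med_b, med_a, median(resid))
    if best is None:
        return None  # series too short to split
    _, k, med_b, med_a, mad = best
    # Require the shift to be large compared with the typical noise (MAD).
    if abs(med_a - med_b) < threshold * max(mad, 1e-12):
        return None
    return k, med_b, med_a
```

The returned index maps back to a lot and date, and the two medians give the "value before" and "value after" columns for a dashboard table.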

**Think about projects that didn’t go as planned and how you may improve them the next time? What happened? What would you change?**

Unsupervised wafer clustering project

Change Point Detection – First pass didn’t think it through in terms of the customer

**When have you met a goal and/or exceeded that goal? Did you have any challenges?**

First Metric made for Fin Residue

**Think about times you’ve had to make business critical decisions independently. How did they affect the team/leaders/business?**

Our team doesn’t usually make decisions hastily when it comes to changes made to our devices. It takes discussion among different groups to make sure all of the impacts are understood and covered. However, since I’m in charge of inline functional yield, a lot of the time I have to make the call about whether a certain change would impact yield drastically or not.

M026 example about increasing the gate’s critical dimension (width): the yield improvement did not seem large enough and the device speed impact was not yet known. However, I had to argue from a potential reliability concern point of view. Even though that’s not my area of expertise, I had to make sure the point got across so that all the information was on the table.

We do a few split lots with small sample sizes. There was one notable case where it seemed like the process change had a 5% impact on yield, which is quite substantial, but the sample size was still very low. I also noticed a strong bias in how the wafers were split: the few wafers with that change were all in the same lot and grouped at the front. I had to argue with my team that we would need more data to verify this yield impact and that it might have been an unfortunate case of selection bias. It turned out the early wafers of that lot were split out and put into a different FOUP on a machine, and that is what caused the degradation, independent of the change.
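
One way to put a number on the "sample size is too small" argument is an exact permutation test on the per-wafer yields. The sketch below is a generic illustration with a hypothetical function name and made-up inputs, not the analysis that was actually run:

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(control, split):
    """Two-sided exact permutation p-value for the difference in mean yield.

    Enumerates every way of choosing len(split) wafers from the pooled set
    and counts how often the mean difference is at least as large as the
    observed one.
    """
    pooled = list(control) + list(split)
    observed = abs(mean(split) - mean(control))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(split)):
        chosen = set(idx)
        group = [pooled[i] for i in chosen]
        rest = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so float rounding does not drop exact ties.
        if abs(mean(group) - mean(rest)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total
```

With 2 split wafers out of 8 there are only 28 possible groupings, so the p-value can never go below 1/28 no matter how large the yield gap is. The test also says nothing about confounding: if every split wafer sat in the front slots, slot position and the process change cannot be separated without a randomized split.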

**Are there times you haven’t been satisfied with current situation or compromise? How did you handle?**

We do a few split lots with small sample sizes. There was one notable case where it seemed like the process change had a 5% impact on yield, which is quite substantial, but the sample size was still very low. I also noticed a strong bias in how the wafers were split: the few wafers with that change were all in the same lot and grouped at the front. I had to argue with my team that we would need more data to verify this yield impact and that it might have been an unfortunate case of selection bias. It turned out the early wafers of that lot were split out and put into a different FOUP on a machine, and that is what caused the degradation, independent of the change.

**Questions**

How easy is it to access the data through SQL so that I can work with it in Python?

Are there any limits to the data you receive from the fab? My experience working in a fabless company is that there are sometimes a lot of limits on what we can receive from the fab, which might hinder how well I can do my job. For example, is there data telling you which wafers went through which tool and process step at what time? The queue time or the entire process time, etc.?

From your experience, what is the company culture like right now? I know there are a lot of layoffs going on, and I was wondering whether that has affected the working conditions at Amazon.

What are the top 3 things that you look for most in your employees?